ABSTRACT

We present a further development of a method for accelerating the calculation of CMB power
spectra, matter power spectra and likelihood functions for use in cosmological Bayesian
inference. The algorithm, called CosmoNet, is based on training a multilayer perceptron
neural network. We demonstrate the capabilities of CosmoNet by computing CMB power spectra
(up to $\ell = 2000$) and matter transfer functions over a hypercube in parameter space
encompassing the $4\sigma$ confidence region of a selection of CMB (WMAP + high resolution
experiments) and large scale structure surveys (2dF and SDSS). We work in the framework of a
generic 7 parameter non-flat cosmology. Additionally we use CosmoNet to compute the
WMAP 3-year, 2dF and SDSS likelihoods over the same region. We find that the average error
in the power spectra is typically well below cosmic variance, and that the experimental
likelihoods are calculated to within a fraction of a log unit. We demonstrate that
marginalised posteriors generated with CosmoNet spectra agree to within a few percent with
those generated by CAMB parallelised over 4 CPUs, but are obtained 2-3 times faster on just
a single processor. Furthermore, posteriors generated directly via CosmoNet likelihoods can
be obtained in less than 30 minutes on a single processor, corresponding to a speed-up of a
factor of ~ 32. We also demonstrate the capabilities of CosmoNet by extending the CMB power
spectra and matter transfer function training to a more generic 10 parameter cosmological
model, including tensor modes, a varying equation of state of dark energy and massive
neutrinos. Finally we demonstrate that, when using CosmoNet likelihoods directly, the
sampling strategy adopted by CosmoMC is highly sub-optimal. We find the generic Bayesys
sampler (Skilling 2004) to be a further ~ 10 times faster, yielding 20,000 post burn-in
samples in our 7 parameter model in just 3 minutes on a single CPU. CosmoNet and interfaces
to both CosmoMC and Bayesys are publicly available at www.mrao.cam.ac.uk/software/cosmonet.

Key words: cosmology: cosmic microwave background - methods: data analysis - methods: 
statistical. 



1 INTRODUCTION 

Bayesian inference in cosmology is normally carried out using sampling-based methods, as
now required by the dimensionality of the models and the increasingly high precision of the
data sets. Typically one requires the calculation of theoretical temperature and
polarisation CMB power spectra $C_\ell^{TT}$, $C_\ell^{TE}$, $C_\ell^{EE}$ and $C_\ell^{BB}$
and/or the matter power spectrum $P(k)$ using codes such as CMBfast (Seljak & Zaldarriaga
1996) or CAMB (Lewis et al. 2000). These codes typically require of order 10 seconds for
spatially-flat models and 50 seconds for non-flat models on a 2 GHz CPU. This approach is
therefore computationally demanding, but does have the advantage that it is simple to
generalise if one wishes to include new physics. CosmoMC (Lewis & Bridle 2002) currently
represents the state of the art in cosmological Markov Chain Monte Carlo (MCMC) sampling
and employs a number of strategies to improve performance, such as a division of the
parameter space into 'slow' parameters (which determine the evolution of structure) and
'fast' parameters (which determine the primordial power spectrum). Nonetheless, the
technique is still computationally expensive.

A number of examples exist in the literature of methods reliant on generating (to some
degree or other) grids of models, within which various interpolations are made to compute
observable spectra at arbitrary parameter values. One example is DASH (Kaplinghat et al.
2002), which requires a considerable investment of some 40 hours to generate a grid of
transfer functions, which can then be used to generate $C_\ell$ spectra for a given
parameter combination about 30 times faster than CAMB.

Jimenez et al. (2004) have built a less demanding method around the novel idea of
transformation into the mostly uncorrelated physical parameterisation introduced by
Kosowsky et al. (2002). Since the $C_\ell$'s then have a simple dependence on the input
parameters, they are relatively easy to model. The algorithm, known as CMBwarp, uses
polynomials to fit the spectra, in which the polynomial coefficients are tied to the
spectra at a single point in the parameter space. This allows spectra to be generated
~ 3000 times faster than CAMB. Of course this method suffers from the drawback that the
single model about which the polynomial fit is specified must be chosen carefully to lie
close to the centre of the posterior distribution, as accuracy decreases away from this
point. Within a $3\sigma$ region around the chosen model the authors estimate that it gives
better than 1 % accuracy.

The advent of larger datasets has meant that the time spent calculating model likelihoods
is rapidly approaching the time necessary to generate the theoretical spectra. CMBfit
(Sandvik et al. 2004) proposes to remove the step of determining spectra altogether by
providing a semi-analytic fit directly to the WMAP likelihood as a function of input
cosmological parameters. Given the ubiquity of WMAP data in cosmological analyses, the
drawback of being tied to a single experiment is however not as limiting as one might think.

The methods just described, although useful, lack general applicability over a range of
theoretical spectra and datasets. We have been motivated to generate a new method that can
be applied, almost blindly, to the problem of cosmological inference in order to remove the
two largest bottlenecks of theoretical spectra generation and likelihood evaluation.
Previously Fendt & Wandelt (2007) built a robust new method based on machine learning
called Pico. Their method requires the assembly of ~ 10^ samples over the parameter space,
drawn uniformly from a desired region that could encompass any confidence region of a given
experiment. This 'training set' is compressed via a principal component analysis (using
Karhunen-Loève eigenmodes), which typically results in a reduction in the dimensionality of
the training set by a factor of two. The training set is used to divide the parameter space
into ~ 100 regions using k-means clustering (see e.g. MacKay 1997), with the aim of each
cluster encompassing a region of parameter space over which the power spectra vary equally.
A polynomial fit is then used over each cluster, providing a local interpolation of the
power spectra within the cluster as a function of cosmological parameters. Crucially, the
method fails to model the spectra accurately over the entire parameter space, hence the
need for cluster division, which makes the algorithm difficult to extend.

Both Pico and CMBwarp provide similar improvements in efficiency, but Pico is an order of
magnitude more accurate than both DASH and CMBwarp. It is generic enough to be extended to
any observable spectra and is flexible enough to allow prediction of likelihood values,
thus incorporating the benefits made by the CMBfit code. Given the current speed of the
WMAP 3-year likelihood code (Hinshaw et al. 2006; hereafter WMAP3), this particular facet
of the method will become extremely important in future analyses.

We previously presented (Auld et al. 2007) a new method that combined all of the advantages
of Pico but in a simpler and more readily expandable form by training neural networks. The
resulting algorithm is called CosmoNet and has some considerable additional benefits in
terms of scalability, accuracy and computational memory requirements. In addition, the
training method we employ is sufficiently general and simple to apply that it allows the
end user to generate their own trained nets over any chosen cosmological model. In this
paper we extend the method to include more generic (non-flat) cosmological models,
interpolations over matter transfer functions and two large scale structure likelihoods in
addition to the suite of CMB power spectra and the WMAP3 likelihood. Additionally we extend
the $\ell$ range of our CMB spectra interpolation to $\ell_{\max} = 2000$. In Sec. 2 we
briefly describe neural networks. In Sec. 3 we describe the CosmoNet algorithm and training
efficiency. In Sec. 4 we present cosmological parameter estimates using the trained
networks implemented as CosmoNet. In Sec. 5 we apply the CosmoNet training algorithm to a
10 parameter cosmological model, training the CMB power spectra and matter transfer
functions and producing parameter estimates. Our discussions and conclusions are presented
in Sec. 6.



2 NEURAL NETWORK INTERPOLATION 

Neural networks are a methodology for computing loosely based around the structures found
in animal brains. They consist of a number of interconnected processors called neurons. The
neurons process information separately and pass information to one another via connections.
Well-designed networks are able to 'learn' from training data and are able to make
predictions when presented with new, possibly incomplete, information. For an introduction
to the science of neural networks the reader is directed to Bailer-Jones (2001).

2.1 Multilayer perceptron networks 

The perceptron (Rosenblatt 1958) is the simplest type of feed-forward neural network. It
maps an input vector $\mathbf{x} \in \mathbb{R}^n$ to a scalar output
$f(\mathbf{x}; \mathbf{w}, \theta)$ via

$$ f(\mathbf{x}; \mathbf{w}, \theta) = \sum_i w_i x_i + \theta, \qquad (1) $$

where $\{w_i\}$ and $\theta$ are the parameters of the perceptron, called the 'weights' and
'bias' respectively.
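As a minimal sketch of equation (1), and not part of the CosmoNet code itself, the perceptron output can be written in a few lines of Python. The function name and the numerical values are chosen here purely for illustration.

```python
def perceptron(x, w, theta):
    """Equation (1): weighted sum of the inputs plus a bias."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + theta


# Hypothetical numbers, chosen only for illustration.
output = perceptron(x=[1.0, 2.0, 3.0], w=[0.5, -1.0, 0.25], theta=0.1)
```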

Multilayer perceptron neural networks (MLPs) are a type of feed-forward network composed of
a number of ordered layers of perceptron neurons that pass scalar messages from one layer
to the next. In this paper, we will work with 3-layer MLPs only. They consist of an input
layer, a hidden layer and an output layer (Fig. 1). In such a network, the outputs of the
nodes in the hidden and output layers take the form

$$ \text{hidden layer:} \quad h_j = g^{(1)}(f_j^{(1)}); \quad f_j^{(1)} = \sum_l w_{jl}^{(1)} x_l + \theta_j^{(1)}, \qquad (2) $$

$$ \text{output layer:} \quad y_i = g^{(2)}(f_i^{(2)}); \quad f_i^{(2)} = \sum_j w_{ij}^{(2)} h_j + \theta_i^{(2)}, \qquad (3) $$

where the index $l$ runs over input nodes, $j$ runs over hidden nodes and $i$ runs over
output nodes. The functions $g^{(1)}$ and $g^{(2)}$ are called activation functions and are
chosen to be bounded, smooth and monotonic. In this paper, we use $g^{(1)}(x) = \tanh x$
and $g^{(2)}(x) = x$, where the non-linear nature of the former is a key ingredient in
constructing a viable network.
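The forward pass defined by equations (2) and (3) can be sketched as follows. This is an illustrative implementation using only the Python standard library, with a tanh hidden layer and a linear output layer as in the text. The function names, the nested-list layout of the weights and the toy numbers are assumptions of this sketch rather than the CosmoNet interface.

```python
import math


def mlp_forward(x, w1, theta1, w2, theta2):
    """Forward pass of a 3-layer MLP, following equations (2) and (3).

    w1[j][l] : weight from input node l to hidden node j
    w2[i][j] : weight from hidden node j to output node i
    """
    # Hidden layer: h_j = tanh(sum_l w1[j][l] * x_l + theta1[j])
    hidden = [
        math.tanh(sum(w_jl * x_l for w_jl, x_l in zip(row, x)) + b)
        for row, b in zip(w1, theta1)
    ]
    # Output layer: y_i = sum_j w2[i][j] * h_j + theta2[i] (linear activation)
    return [
        sum(w_ij * h_j for w_ij, h_j in zip(row, hidden)) + b
        for row, b in zip(w2, theta2)
    ]


def n_parameters(n_in, n_hidden, n_out):
    """Number of free weights and biases in the network."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


# Toy network: 2 inputs, 2 hidden nodes, 1 output; all values are made up.
y = mlp_forward(
    x=[0.0, 0.0],
    w1=[[1.0, -1.0], [0.5, 0.5]],
    theta1=[0.0, 0.0],
    w2=[[2.0, 3.0]],
    theta2=[0.25],
)
```

With all inputs set to zero the hidden nodes output tanh(0) = 0, so the network returns only the output bias. The helper `n_parameters` counts the quantities to be determined in training: a network with seven inputs, 3 hidden nodes and five outputs has 36 weights and 8 biases.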

The weights $\mathbf{w}$ and biases $\theta$ are the quantities we wish to determine, which
we denote collectively by $\mathbf{a}$. As these parameters vary, a very wide range of
non-linear mappings between the inputs and outputs are possible. In fact, according to a
'universal approximation theorem' (Leshno et al. 1993), a standard multilayer feed-forward
network with a locally bounded piecewise continuous activation function can approximate any
continuous function to any degree of accuracy if (and only if) the network's activation
function is not a polynomial. This result applies when activation functions are chosen a
priori and held fixed as $\mathbf{a}$ varies. Accuracy increases with the number of nodes
in the hidden layer, and the above theorem tells us we can always choose sufficient hidden
nodes to produce any accuracy. Since the mapping from cosmological parameter space to the
space of CMB power spectra (and WMAP3 likelihood) is known to be continuous, a 3-layer MLP
with an appropriate choice of activation function is an excellent candidate model for the
replacement of the forward model provided by the CAMB package (and WMAP3 likelihood code).

Figure 1. An example of a 3-layer neural network with seven input nodes, 3 nodes in the
hidden layer and five output nodes. Each line represents one weight.

The activation functions act as basic building blocks of non- 
linearity in a neural network model and should be as simple as 
possible. Additionally, the MemSys routines used in training (de- 
scribed below) require derivative information and so they should 
be differentiable. The universal approximation theorem thus moti- 
vates us to choose a monotonic (for simplicity), bounded and dif- 
ferentiable function that is not a polynomial and we choose the 
tanh function. Of course, this could be replaced by another such 
function, such as the sigmoid function, but the interpolation results 
would be almost identical. 

2.2 Network training 

Let us consider building an empirical model of the CAMB mapping using a 3-layer MLP as
described above (a model of the different likelihood codes can be constructed in an
analogous manner). The number of nodes in the input layer will correspond to the number of
cosmological parameters, and the number in the output layer will be the number of
uninterpolated $C_\ell$ values output by CAMB. A set of training data
$D = \{\mathbf{x}^{(k)}, \mathbf{t}^{(k)}\}$ is provided by CAMB (the precise form of which
is described later) and the problem now reduces to choosing the appropriate weights and
biases of the neural network that best fit this training data.

As the CAMB mapping is exact, this is a deterministic problem, not a probabilistic one. We
therefore wish to choose network parameters $\mathbf{a}$ that minimise the 'error' term
$\chi^2(\mathbf{a})$ on the training set given by

$$ \chi^2(\mathbf{a}) = \frac{1}{2} \sum_k \sum_i \left[ t_i^{(k)} - y_i(\mathbf{x}^{(k)}; \mathbf{a}) \right]^2. \qquad (4) $$

This is, however, a highly non-linear, multi-modal function in many dimensions whose
optimisation poses a non-trivial problem. Despite the deterministic nature of the problem
we use an extension of a Bayesian method provided by the MemSys package (Gull & Skilling
1999).
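The training error of equation (4) can be illustrated with a short Python function. This is a sketch only: it assumes the error is half the sum of squared residuals over all training examples and all outputs, and the variable names and numbers are hypothetical.

```python
def chi_squared(targets, outputs):
    """Equation (4): half the summed squared residuals over the training set.

    targets[k][i] : i-th target value of the k-th training example
    outputs[k][i] : corresponding network output y_i(x^(k); a)
    The factor of one half is an assumption of this sketch.
    """
    return 0.5 * sum(
        (t_i - y_i) ** 2
        for t_k, y_k in zip(targets, outputs)
        for t_i, y_i in zip(t_k, y_k)
    )


# Two made-up training examples with two outputs each.
targets = [[1.0, 2.0], [3.0, 4.0]]
outputs = [[1.0, 1.0], [5.0, 4.0]]
error = chi_squared(targets, outputs)
```

Here the residuals are 0, 1, -2 and 0, so the summed squares equal 5 and the error is 2.5.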

The MemSys algorithm considers the parameters $\mathbf{a}$ of the network to be
probabilistic variables with prior probability distribution proportional to
$\exp(\alpha S(\mathbf{a}))$, where $S(\mathbf{a})$ is the positive-negative entropy
functional (Gull & Skilling 1999; Hobson & Lasenby 1998) and $\alpha$ is considered a
hyper-parameter of the prior. The variable $\alpha$ sets the scale over which variations in
$\mathbf{a}$ are expected, and is chosen to maximise its marginal posterior probability.
Its value is inversely proportional to the standard deviation of the prior. For fixed
$\alpha$, the log-posterior is thus proportional to
$-\chi^2(\mathbf{a}) + \alpha S(\mathbf{a})$. For each choice of $\alpha$ there is a
solution $\hat{\mathbf{a}}$ that maximises the posterior. As $\alpha$ varies, the set of
solutions $\hat{\mathbf{a}}$ is called the 'maximum-entropy trajectory'. We wish to find
the maximum of $-\chi^2$, which is the solution at the end of the trajectory where
$\alpha = 0$. It is difficult to recover results for $\alpha < \infty$ (for large $\alpha$
the solution is found at the maximum of the prior) when starting with a result that lies
far from the trajectory. Thus for practical purposes, it is best to start from the point on
the trajectory at $\alpha = \infty$ and iterate $\alpha$ downwards until either a Bayesian
$\alpha$ is achieved or, in our deterministic case, $\alpha$ is sufficiently small that the
posterior is dominated by $\chi^2$.
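The idea of following a trajectory of solutions as the regularisation is relaxed can be illustrated with a deliberately simple toy problem. The sketch below is not the MemSys algorithm: it assumes a one-parameter linear model y = a x, a squared-residual error with a factor of one half, and a quadratic penalty S(a) = -a^2/2 standing in for the entropy functional, so that the maximiser of -chi^2 + alpha S is available in closed form. All names and numbers are hypothetical.

```python
def trajectory_solution(xs, ts, alpha):
    """Maximiser of -chi2(a) + alpha * S(a) for the toy model y = a * x.

    chi2(a) = 0.5 * sum_k (t_k - a * x_k)^2, and S(a) = -0.5 * a^2 is an
    ASSUMED quadratic stand-in for the entropy, with its maximum at a = 0.
    Setting the derivative to zero gives a = sum(x t) / (sum(x^2) + alpha).
    """
    sxt = sum(x * t for x, t in zip(xs, ts))
    sxx = sum(x * x for x in xs)
    return sxt / (sxx + alpha)


# Made-up training data lying exactly on t = 2 x.
xs = [1.0, 2.0, 3.0]
ts = [2.0, 4.0, 6.0]

# Step alpha downwards from a large value towards zero.
alphas = [1e6, 1e3, 1e1, 1e-1, 0.0]
trajectory = [trajectory_solution(xs, ts, alpha) for alpha in alphas]
```

For large alpha the solution sits near the maximum of the prior (a = 0), and as alpha is reduced it moves smoothly towards the least-squares value a = 2, which is the behaviour the text describes for the end of the trajectory.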

MemSys performs the algorithm using conjugate gradients at each step to converge to the
maximum-entropy trajectory. The required matrix of second derivatives of $\chi^2$ is
approximated using vector routines only. This avoids the need for the $O(N^3)$ operations
required to perform exact calculations, which would be impractical for large problems. The
application of MemSys to the problem of network training allows for the fast, efficient
training of relatively large network structures on large data sets that would otherwise be
difficult to perform in a useful time-frame. The MemSys algorithms are described in greater
detail in Gull & Skilling (1999).
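One common way to use second-derivative information with vector operations only is to approximate the product of the Hessian with a vector by differencing the gradient. The sketch below shows this generic finite-difference technique on a toy quadratic function; it is offered as an illustration of the idea and is not claimed to be the routine that MemSys uses. The function names and the step size `eps` are assumptions.

```python
def hessian_vector_product(grad, a, v, eps=1e-6):
    """Approximate H v by a forward difference of the gradient.

    grad : function returning the gradient (a list) at a point
    a, v : current parameters and the direction vector
    Only gradient evaluations and vector operations are used; the full
    N x N matrix of second derivatives is never formed.
    """
    shifted = [a_i + eps * v_i for a_i, v_i in zip(a, v)]
    g0 = grad(a)
    g1 = grad(shifted)
    return [(g1_i - g0_i) / eps for g1_i, g0_i in zip(g1, g0)]


def toy_grad(a):
    """Gradient of the toy function f(a) = a0^2 + 3 a0 a1 + 2 a1^2."""
    return [2.0 * a[0] + 3.0 * a[1], 3.0 * a[0] + 4.0 * a[1]]


# The exact Hessian is [[2, 3], [3, 4]], so H v = [5, 7] for v = [1, 1].
hv = hessian_vector_product(toy_grad, a=[0.3, -0.2], v=[1.0, 1.0])
```

Each product costs two gradient evaluations, which is what makes conjugate-gradient iterations affordable when the number of network parameters is large.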



3 RESULTS 

We will demonstrate the application of neural network training to cosmology by attempting
to replace the CAMB generator for the computation of CMB power spectra up to $\ell = 2000$,
in both temperature and polarisation ($C_\ell^{TT}$, $C_\ell^{TE}$, $C_\ell^{EE}$), and the
matter power spectrum $P(k)$. In general CAMB does not compute the CMB spectra $C_\ell$ for
all $\ell$; instead it computes a set of 60 values (up to $\ell = 2000$) chosen at
appropriately spaced intervals to ensure coverage over the main acoustic peaks. A cubic
spline interpolation is then carried out internally in CAMB to produce a full complement of
$C_\ell$'s at each $\ell$ to compare with the data. In the case of flat geometries these
chosen $\ell$ values are predetermined and fixed, but in non-flat cases they shift, as the
features of the acoustic peak structure do with $\Omega_k$. CAMB chooses the most
appropriate $\ell$ set to ensure the main features are covered. This creates a difficulty
for our training algorithm, as one would normally wish to learn how a fixed set of
observables changes with input parameters. In this case the observables themselves are
changing. In fact, as we shall demonstrate, if we fix the set of $\ell$'s to those used for
flat geometries, we see some degradation in the accuracy of the spectra but minimal impact
on the marginalised posteriors.

In addition to the CMB power spectra, CAMB also generates matter power spectra for
comparison with large scale structure data. We chose not to train over the spectrum
directly, but instead trained the matter transfer function $T(k)$, which can be used to
generate $P(k)$ given the primordial spectrum. This has the advantage of allowing us to
evaluate a number of derived parameters, such as the age of the universe and $\sigma_8$,
without the need for further trained networks¹. Since the acoustic peak structures that
appear in the CMB also appear in the matter spectra and transfer functions, CAMB also sets
appropriate scales on which to generate the spectrum in non-flat cosmologies. In the same
manner in which we dealt with the CMB spectra, we have trained the networks over a
predetermined, but sufficiently dense, set of fixed $k$ values (for example CAMB normally
generates the function at ~ 75 such values; in this interpolation we have used ~ 175).
Again this approximation has led to minimal impact on the posteriors obtained.

¹ A future goal is to train networks also over the transfer functions for CMB power
spectra to achieve the same generality, but this involves substantial additional
complications and will be explored in a subsequent publication.
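To make the relation between the transfer function and the power spectrum concrete, the sketch below assumes a power-law primordial spectrum, for which $P(k) \propto k^{n_s} T^2(k)$. The single overall `amplitude` argument is a simplifying assumption of this illustration and does not reproduce CAMB's normalisation conventions; the wavenumbers and transfer-function values are made up.

```python
def matter_power_spectrum(ks, transfer, n_s, amplitude=1.0):
    """P(k) = amplitude * k**n_s * T(k)**2 for a power-law primordial spectrum.

    The normalisation convention (a single overall amplitude) is an
    assumption made for this sketch; it is not CAMB's internal convention.
    """
    return [amplitude * k ** n_s * t ** 2 for k, t in zip(ks, transfer)]


# Made-up wavenumbers and transfer-function values, for illustration only.
ks = [0.01, 0.1, 1.0]
transfer = [1.0, 0.5, 0.01]
pk = matter_power_spectrum(ks, transfer, n_s=1.0)
```

Because the primordial parameters enter only through this final step, a single trained transfer-function network can serve any choice of spectral index and amplitude.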

Current likelihood codes, such as the newly released WMAP3, 
now require similar computation times to the generation of spectra. 
This trend is not likely to improve in the future as larger datasets 
come on stream. Thus it is crucial if we are to improve the effi- 
ciency of cosmological inference to have a combined approach for 
the spectra generation as well as likelihoods. In this paper we have 
exploited the same network training algorithm used for spectra to 
predict WMAP likelihoods as well as large scale structure likeli- 
hoods from the 2dF and SDSS surveys. Replacement of these codes 
and CAMB thus alleviates both major bottlenecks in cosmological 
Bayesian inference. 



3.1 Training Data 

In order to replace the CAMB package in codes such as CosmoMC we need to decide upon an
appropriate region within which to train the networks. Inside this region the regression
codes reliant on the trained networks would predict the appropriate spectra, and outside
this region CAMB would need to be called in the normal fashion. Choosing too large a region
will lead to longer training periods and a reduction in the interpolation accuracy. Too
small a region and CAMB would be called so often by the MCMC sampler as to render any
performance increase negligible. Training was thus carried out by uniformly sampling a
$4\sigma$ confidence region as determined using a typical mixture of CMB and large scale
structure experiments: WMAP3 + higher resolution CMB observations (ACBAR: Kuo et al. 2004;
BOOMERanG: Jones et al. 2006, Montroy et al. 2006; CBI: Readhead et al. 2004a,b; and the
VSA: Dickinson et al. 2004) and galaxy surveys (2dF: Percival et al. 2001; SDSS: Tegmark et
al. 2004).
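Uniform sampling of a hypercube in parameter space, as used to build the training set, can be sketched as below. The parameter names and limits are hypothetical placeholders and are not the ranges adopted in this paper; each sampled point would then be passed to CAMB to produce the corresponding training targets.

```python
import random


def sample_hypercube(bounds, n_samples, seed=0):
    """Draw points uniformly from an axis-aligned box in parameter space.

    bounds : dict mapping parameter name -> (lower, upper)
    """
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        for _ in range(n_samples)
    ]


# HYPOTHETICAL limits, not the ranges used in the paper.
bounds = {
    "omega_b_h2": (0.018, 0.026),
    "omega_cdm_h2": (0.08, 0.14),
    "n_s": (0.85, 1.05),
}
training_inputs = sample_hypercube(bounds, n_samples=2000)
```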

To test the approach we performed training over a non-flat cosmology parameterised by
$(\Omega_b h^2, \Omega_{cdm} h^2, \Omega_k, \theta, \tau, n_s, A_s)$. The physical
parameters ($\Omega_b h^2$, $\Omega_{cdm} h^2$, $\theta$, $\tau$) were converted back to
cosmological parameters ($\Omega_b$, $\Omega_{cdm}$, $H_0$, $z_{re}$) and used as input to
CAMB to produce the training set of CMB power spectra and matter transfer functions.
Ultimately we aim to train networks over a sufficiently general cosmological model (see
Sec. 5) so that the user could perform any analysis over a subset of the trained
parameters, setting unwanted variables to whatever fixed value they choose. In this way the
flat model computed previously in Auld et al. (2007) is superseded by the results of this
paper.



3.2 Training Efficiency 

To investigate training efficiency with training data set size and number of hidden network
nodes, we evaluate the testing error as the maximum-entropy trajectory is traversed. The
training was conducted on a single 2.2 GHz processor. Asymptotic behaviour was observed: the
testing error appears to settle down after a period of logarithmic decrease. For a network
of this size with this amount of data it appears disproportionate to train past ~ 100
hours; indeed adequate results can be obtained in just a few hours. It is expected that
some tiny increase in accuracy could be achieved for much longer training periods. However,
this would be disproportionate, unless there is a significant error propagated through to
the parameter constraints generated by these networks.

Figure 2. Evolution of test errors during training for networks with different numbers of
nodes in the hidden layer. Too few hidden nodes reduce accuracy, and too many slow
training.

                             Training Data    Hidden Nodes
CMB spectra                  2000             50
Matter transfer function     2000             50
Likelihoods                  3000             50

Table 1. The optimal number of training data and hidden nodes in the neural network
training.

For each of the neural networks, training was then performed with 5000 training data but
using different numbers of hidden nodes. Fig. 2 shows the testing error evolution for
networks with 10, 25, 50, 100 and 250 nodes in the hidden layer, for the $C_\ell^{TT}$
spectrum. It can be seen that increasing the number of nodes past 50 does not increase
accuracy, but does increase the training time. Similar experiments were then performed to
determine the optimal size of training set. Again it was observed that, for each neural
network, increasing the training set size past a certain value did not increase accuracy,
but did slow training. The optimal numbers of hidden nodes and training set sizes obtained
for all networks are displayed in Table 1.

We note that in Habib et al. (2007) sub-percentage errors on the CMB spectra are achieved
for a 6 parameter flat $\Lambda$CDM model over a much larger region of parameter space,
using a Gaussian process with just 128 training data. In this paper we have found that of
order 1000 training data produce optimal results (for non-flat models) and we proceed on
this basis. However, the reader should note that tests showed that CosmoNet also generated
usable accuracies using hundreds rather than thousands of training data. We do not consider
the use of more training data to be a large overhead, however, as the data need only be
generated once, and CosmoNet training time scales linearly with data set size. We believe
that the method presented in Habib et al. (2007) would become more accurate with more
training data, but that training time may suffer, as it needs the inversion of a matrix,
which requires a number of operations of order the cube of the data set size.

3.3 Training Results 

Networks were trained on the optimal numbers of hidden nodes and training data for ~ 100
hours. The accuracy of each interpolated spectrum and likelihood was then evaluated on a
test set of 10"* models drawn uniformly from the appropriate parameter hypercubes (see
Fig. 3). As discussed, we would expect some error to be introduced in our interpolations
for non-flat models owing to our use of a fixed (flat) set of $\ell$ values. We find a mean
error of ~ 5% of cosmic variance, as compared to the ~ 1% error found in our previous
analysis of flat models (Auld et al. 2007), which of course is still well below any
possible experimental error. More importantly, the 99th percentile errors are all
comfortably below cosmic variance, showing that the networks will be usable even when
analysing data from a perfect experiment. A loss in accuracy is also observed for the
matter transfer function interpolation. Here we find a mean error of less than 0.2 %,
representing a considerably larger drop in accuracy than with the CMB spectra. However,
0.2 % still represents a small inaccuracy given the quality of current large scale
structure datasets. The likelihood test set correlation coefficients were all > 0.9999,
with errors of less than 0.2 log units close to the peak, though with slightly larger
deviations away from it.
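For reference, an interpolation error can be expressed in units of cosmic variance as sketched below. The sketch assumes the standard full-sky expression for an auto-spectrum, $\sigma(C_\ell) = \sqrt{2/(2\ell+1)}\, C_\ell$; whether exactly this definition was used for the figures quoted here is an assumption, and the cross-spectrum TE requires a different expression. The numbers are made up.

```python
import math


def cosmic_variance_sigma(ell, c_ell):
    """Cosmic-variance standard deviation of an auto-spectrum C_ell."""
    return math.sqrt(2.0 / (2.0 * ell + 1.0)) * c_ell


def error_in_cosmic_variance_units(ell, c_true, c_interp):
    """Absolute interpolation error divided by the cosmic-variance sigma."""
    return abs(c_interp - c_true) / cosmic_variance_sigma(ell, c_true)


# Made-up values: a 0.1 per cent error at ell = 1000.
err = error_in_cosmic_variance_units(ell=1000, c_true=1.0, c_interp=1.001)
```

In this example a 0.1 per cent error at $\ell = 1000$ corresponds to roughly 3 per cent of cosmic variance, which shows why sub-per-cent interpolation accuracy is needed at high multipoles.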



Method           (i)          (ii)        (iii)        (iv)
No. chains       4            4           4            4
No. CPU/chain    4            1           1            1
Run time         > 16 hrs     ~ 8 hrs     ~ 30 mins    ~ 3 mins

Table 2. Time required to gather ~ 20,000 post burn-in MCMC samples using different
combinations of CAMB, CosmoNet, the experimental likelihood codes and Bayesys. Note that
CAMB is parallelised in method (i) over 4 CPUs per chain; if a single processor were used
these timings would approach 4 × that quoted.



The reader should also note, from both our previous work (Auld et al. 2007) and that of the
10 parameter models below, that the timings for parameter estimation are roughly
independent of the number of model parameters used. Our regression algorithm is indifferent
to the complexity of the input cosmology.



4 APPLICATION TO COSMOLOGICAL PARAMETER 
ESTIMATION 

To illustrate the usefulness of CosmoNet in cosmological inference we perform an analysis
of the WMAP 3-year TT, TE, EE data and 2dF and SDSS surveys in four separate ways: (i)
using CosmoMC with CAMB power spectra and the WMAP3, 2dF and SDSS likelihood codes; (ii)
using CosmoMC with CosmoNet power spectra and the WMAP3, 2dF and SDSS likelihood codes;
(iii) using CosmoMC with the CosmoNet likelihood networks alone; and (iv) using CosmoNet
likelihoods with the Bayesys sampler. The resulting marginalised parameter constraints
using each method are shown in Fig. 4 and are clearly very similar, with mean parameter
values differing by less than 1 % of the value computed using the standard approach (i).

To determine the speed-up introduced by using CosmoNet spectra and likelihood
interpolations, 4 parallel MCMC chains were run on Intel Itanium 2 processors at the COSMOS
cluster (SGI Altix 3700) at DAMTP, Cambridge using the basic CosmoMC sampling package. The
time required to generate ~ 20,000 post burn-in MCMC samples was recorded using methods
(i)-(iv) described above². The results (see Table 2) illustrate that using CosmoNet spectra
one can obtain reasonable posterior distributions in roughly 8 hours on a single CPU per
chain, whereas using CAMB not only took 2-3 times longer but required 3 additional CPUs per
chain. Using CosmoNet's likelihood interpolations alone produced dramatic time savings,
with accurate results in roughly 30 minutes.

Cosmologists have invested considerable time in developing samplers that have as efficient
a proposal distribution as possible. In CosmoMC, the multivariate Gaussian proposal
distribution has a covariance matrix that is regularly updated using statistics from the
samples gathered up to that point. This does lead to a higher acceptance rate and a
correspondingly lower number of likelihood evaluations, but is computationally intensive in
its own right. However, when using CosmoNet likelihoods directly there is no need to reduce
the number of likelihood calls. The process of updating the proposal distribution slows the
task considerably, as can be seen when comparing times with the efficient, yet
likelihood-intensive, Bayesys sampler (Skilling 2004) via method (iv), which computes the
relevant posteriors in just 3 minutes.
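The trade-off described above can be illustrated with a plain random-walk Metropolis sampler that uses a fixed proposal width and therefore spends no time updating a covariance matrix. This sketch is a generic stand-in and does not reproduce the Bayesys or CosmoMC algorithms; the toy Gaussian log-likelihood plays the role of a fast network likelihood, and all names and settings are assumptions.

```python
import math
import random


def metropolis(log_like, start, step, n_samples, seed=0):
    """Random-walk Metropolis with a FIXED Gaussian proposal width."""
    rng = random.Random(seed)
    x = list(start)
    logl = log_like(x)
    chain = []
    n_calls = 1
    for _ in range(n_samples):
        proposal = [x_i + rng.gauss(0.0, step) for x_i in x]
        logl_new = log_like(proposal)
        n_calls += 1
        # Accept with probability min(1, L_new / L_old).
        if math.log(1.0 - rng.random()) < logl_new - logl:
            x, logl = proposal, logl_new
        chain.append(list(x))
    return chain, n_calls


def toy_log_like(x):
    """Stand-in for a fast network likelihood: a unit Gaussian."""
    return -0.5 * sum(x_i * x_i for x_i in x)


chain, n_calls = metropolis(toy_log_like, start=[0.0, 0.0], step=1.0,
                            n_samples=20000)
mean_0 = sum(s[0] for s in chain) / len(chain)
```

Every iteration costs exactly one likelihood call, so when each call takes a negligible time the simplicity of the proposal matters more than its acceptance rate.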

² Note that CAMB was in fact parallelised over 3 additional processors per chain,
therefore totalling 16 CPUs.



5 TOWARDS A 10 DIMENSIONAL PARAMETER SPACE 

In Auld et al. (2007) we presented trained networks capable of replacing CAMB and
experimental likelihood codes for a 6 parameter flat cosmology. In this paper we have shown
that this method is easily extendable to the more arduous computational demands of a
non-flat cosmology. To test the scalability to even higher dimensions we now examine a 10
dimensional cosmology including, in addition to the basic 7 parameters given in Sec. 3, the
equation of state of dark energy, $w$, the neutrino mass fraction, $f_\nu$, and the
tensor-to-scalar ratio, $r$.

Training efficiency was examined as per the 7 parameter 
model (see Table|3) and it was found that little increase in the quan- 
tity of training data or training time was needed for optimum results 
for the CMB power spectra and matter transfer function. An accu- 
rate tracer of the scaling of the training algorithm is given by the 
number of network hidden nodes as this determines the amount of 
computational resource required. In this case we find that at worst a 
50% increase in the number of hidden nodes is needed for a ~ 40% 
rise in the number of parameter dimensions (going from 7 to 10). 
This represents slightly more than a linear rise in resources and 
demonstrates our algorithm is easily scaleable to even higher di- 
mensions if necessary. The accuracy of interpolated CMB spectra 
and matter transfer functions did not decrease at all when com- 
pared to the 7 parameter interpolations (see Fig.jSj, suggesting that 
the largest source of error in our method is introduced by fixing 
the set of I and k values at their flat positions. Providing an ac- 
curate interpolation for the three likelihood surfaces was however 
problematic. Using the order of 1000 training data provides very 
sparse coverage of the 10 dimensional hypercube. For CMB power 
spectra and the matter transfer function this is not a problem since 
they vary smoothly over a limited dynamical range. For likelihoods 
however, the dynamical range is much larger and to obtain parame- 
ter constraints we need very good accuracy within a region having 
a volume of the order of that of a la hypersphere. This hypersphere 
has a volume over 400,000 times less than the 10 parameter 4a hy- 
percube over which we performed the training. This suggests that 
much more training data would be required. A potential solution to 
this problem would be to train likelihood networks only in some re- 
gion over which the likelihood value was within (say) 50 log units 
of the peak value. The shape of this region could be determined 
by a classification net that returns an output that predicts whether a 
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(b) 




(d) 



Figure 3. Comparison of the performance of CosmoNet versus CAMB for TT, TE and EE power spectra (a-c) and the matter transfer function (d) in a 7
parameter non-flat cosmology. The CMB plots show the average error together with the 95 and 99 percentiles in units of cosmic variance. The transfer function
is shown with % error.

Figure 4. The one-dimensional marginalised posteriors on the cosmological parameters within the 7-parameter non-flat cosmology comparing: CAMB
power-spectra and WMAP3, 2dF and SDSS likelihoods (red) with (a) CosmoNet power spectra and WMAP3, 2dF and SDSS likelihoods (black) and (b)
CosmoNet likelihoods (black).





Network         Training Data    Hidden Nodes
CMB Spectra     2000             75
MPT Function    2000             50

Table 3. The required number of data and hidden nodes in the neural
network training for optimum performance in a 10 dimensional
parameterisation.



Method           (i)           (ii)
No. chains       4             4
No. CPU/chain    4             1
Run time         > 20 hours    ~ 8 hours

Table 4. Time required to gather ~ 20,000 post burn-in MCMC samples
using different combinations of CAMB, CosmoNet and the experimental
likelihood codes. Note that CAMB is parallelised in method (i) over 4
CPUs per chain; if a single processor were used these timings would
approach 4 × that quoted.



point lies inside or outside the desired region. This method will be 
explored in a future publication. 
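
The quoted volume ratio can be checked with a short calculation. The sketch below is illustrative only: it assumes a hypersphere of radius 1σ and a hypercube of side 4σ, an interpretation that is not spelled out in the text but that reproduces the quoted factor in 10 dimensions. The function names are hypothetical.

```python
import math


def ball_volume(n_dim, radius=1.0):
    """Volume of an n-dimensional ball: pi^(n/2) / Gamma(n/2 + 1) * r^n."""
    return math.pi ** (n_dim / 2) / math.gamma(n_dim / 2 + 1) * radius ** n_dim


def cube_to_ball_ratio(n_dim, side, radius=1.0):
    """Ratio of the volume of a hypercube of given side to that of a ball."""
    return side ** n_dim / ball_volume(n_dim, radius)


# Assumption: lengths are in units of sigma, cube side = 4, ball radius = 1.
ratio_10 = cube_to_ball_ratio(10, side=4.0)  # roughly 4.1e5
ratio_7 = cube_to_ball_ratio(7, side=4.0)    # a few thousand

print(f"10 parameters: cube/ball volume ratio = {ratio_10:.3g}")
print(f" 7 parameters: cube/ball volume ratio = {ratio_7:.3g}")
```

Under the same assumption the ratio in 7 dimensions is only a few thousand, which illustrates how quickly a fixed number of training points becomes sparse as the number of parameters grows.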

Marginalised posteriors obtained from CosmoNet spectra
were found to be accurate to within a few % of those computed
via CAMB (see Fig. 6), and took roughly 8 hours on a single CPU
per chain to calculate (see Table 4). CAMB, however, required more
than 20 hours of computational time even with parallelisation over
4 CPUs per chain.
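
The timings in Table 4 can be turned into speed-up factors with simple arithmetic. The sketch below treats the table entries as bounds (20 hours is a lower limit for method (i), 8 hours an approximate figure for method (ii)); the function names are illustrative and not part of any CosmoNet interface.

```python
def cpu_hours(run_time_hours, n_chains, cpus_per_chain):
    """Total processor time consumed by a run."""
    return run_time_hours * n_chains * cpus_per_chain


# Values taken from Table 4; 20 h is a lower bound for method (i).
camb_cpu_hours = cpu_hours(20.0, n_chains=4, cpus_per_chain=4)
cosmonet_cpu_hours = cpu_hours(8.0, n_chains=4, cpus_per_chain=1)

speedup_wall = 20.0 / 8.0                           # elapsed-time gain
speedup_cpu = camb_cpu_hours / cosmonet_cpu_hours   # processor-time gain

print(f"wall-clock speed-up >= {speedup_wall:.1f}")
print(f"CPU-time speed-up   >= {speedup_cpu:.1f}")
```

On these numbers the gain in elapsed time is at least about 2.5, while the saving in total processor time is at least about a factor of 10, since method (ii) uses a quarter of the CPUs.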



6 DISCUSSION AND CONCLUSIONS 

We have extended our method of accelerating the estimation of
CMB power spectra and matter transfer functions, together with
WMAP, 2dF and SDSS likelihood evaluations, to more generic
non-flat cosmologies. The method is based on training a multilayer
perceptron neural network. We have demonstrated that trained neural
networks such as CosmoNet can replace the bulk of the computational
effort required by cosmological evolution codes such as CAMB and
experimental likelihood codes, like that of WMAP3. CosmoNet shares
all the improvements made by PICO in terms of accuracy on both
spectral interpolation and parameter constraints, but has now been
scaled to a more generic 7 parameter non-flat cosmology. Furthermore,
although the training procedure requires the optimisation of a highly
non-linear multi-dimensional function, the end user simply runs the
MemSys package essentially as a 'black box'. This means CosmoNet
remains simple and efficient to train. We have found the biggest
bottleneck in the procedure to be the generation of training and
testing data using CAMB. Increasing the model complexity had limited
impact on the necessary training time (all models taking about 100
hours to train) or interpolation accuracy. Moreover, the increase in
the number of network hidden nodes was at worst linear in the number
of parameters. Thus we expect few resource difficulties in extending
this method to even higher dimensions.
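
To make the structure of such a network concrete, the following is a minimal sketch of a forward pass through a multilayer perceptron. It is not the CosmoNet or MemSys implementation: the single hidden layer, the tanh activation, the random untrained weights, the output size of 50 and all function names are assumptions made for illustration. Only the 10 inputs and 75 hidden nodes are taken from Table 3.

```python
import math
import random


def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Random (untrained) weights for a one-hidden-layer perceptron."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return w1, b1, w2, b2


def forward(params, x):
    """Map input parameters x to outputs: tanh hidden layer, linear output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


def n_weights(n_in, n_hidden, n_out):
    """Number of free parameters (weights plus biases) to be optimised."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


# 10 cosmological parameters in, 75 hidden nodes (Table 3),
# 50 output values (hypothetical number of sampled multipoles).
net = init_mlp(n_in=10, n_hidden=75, n_out=50)
outputs = forward(net, [0.5] * 10)
print(len(outputs), n_weights(10, 75, 50))
```

Once the weights are fixed by training, each evaluation costs only two small matrix-vector products, which is why a trained network can stand in for a full evolution code inside a sampling loop.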

Although accurate likelihood interpolations in the 10 dimensional
model are currently beyond the reach of our method, the corresponding
CMB spectra and matter transfer functions are sufficiently accurate
to allow a speed up over the standard performance of CosmoMC.

Finally, replacing the CosmoMC sampler entirely with
Bayesys can produce further dramatic time savings of a factor of
~ 10, computing ~ 20,000 post burn-in samples in a few minutes
on a single CPU.



Figure 5. Comparison of the performance of CosmoNet versus CAMB
for TT, TE, EE and BB power spectra (a-d) and matter transfer function
(e) in a 10 parameter non-flat cosmological model. The CMB plots show
the average error together with the 95 and 99 percentiles in units of cosmic
variance. The transfer function is shown with % error.

Figure 6. The one-dimensional marginalised posteriors on the cosmological
parameters within the 10-parameter non-flat ΛCDM model including
tensor modes, varying equation of state of dark energy and massive
neutrinos comparing: CAMB power-spectra and WMAP3, 2dF and SDSS
likelihoods (red) with CosmoNet power spectra and WMAP3, 2dF and SDSS
likelihoods (black).